Hi, we talked about gradient descent last time, and the problem with gradient descent was that it tends to exhibit this zigzag behavior. Well, my sketch may make it look a bit better than it actually is; it does something like that, zigzagging like this. Today we are going to talk about something else, which I'm calling orthogonal directions descent.
This is not a method that works in practice; it can't work, because it relies on information that we do not have. But it will still give us the right idea for how to fix this problem that gradient descent exhibits.
One important piece of information is that gradient descent can in fact converge quickly
if the initial position is set right.
So this is zigzag behavior and that is inefficient.
What's the right initial position?
Of course, if we set the initial position at the minimum, then we have converged in zero steps, but there is also a way of achieving convergence in one step, and this is done as follows.
If x0 is set such that r0, the initial residual, is an eigenvector of B, then x1, the position after one step, is identical to the minimum x star.
So this looks like that: these are level sets. We get zigzag if we start here or here (sorry, you can't see the cursor), so if we start somewhere here or somewhere here. But if we start, let's say, at this point, which lies exactly on this half-axis of the ellipsoid, then we go to the minimum in one step, which is a bit better: we start here and then we jump to the minimum in one step.
So if that is x0, then this is x1.
And similarly if we start here, we also converge in one step.
That's due to the fact that r0 is an eigenvector of B, and you can prove this in the exercises. It's an elementary computation; it's not super easy to see, but if you do the computation carefully, you will see that this is true.
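This one-step behavior is easy to check numerically. The following is a minimal sketch, not from the lecture itself, of steepest descent with exact line search on phi(x) = 1/2 x^T B x for an illustrative diagonal B, so that its eigenvectors are the coordinate axes: a start whose residual is an eigenvector converges in one step, while a generic start zigzags through many small steps.

```python
# Steepest descent with exact line search on phi(x) = 1/2 x^T B x.
# The minimum is x* = 0; B and the start points are illustrative choices.

def matvec(B, x):
    return [B[0][0] * x[0] + B[0][1] * x[1],
            B[1][0] * x[0] + B[1][1] * x[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def steepest_descent(B, x, tol=1e-10, max_steps=500):
    steps = 0
    while steps < max_steps:
        r = [-g for g in matvec(B, x)]             # r = -grad phi(x)
        if dot(r, r) ** 0.5 < tol:
            break
        alpha = dot(r, r) / dot(r, matvec(B, r))   # exact line search step length
        x = [x[0] + alpha * r[0], x[1] + alpha * r[1]]
        steps += 1
    return x, steps

B = [[1.0, 0.0], [0.0, 9.0]]    # diagonal, so eigenvectors are the axes

# Start on a half-axis of the ellipsoid: r0 is an eigenvector of B,
# and a single step lands exactly on the minimum.
x_eig, n_eig = steepest_descent(B, [1.0, 0.0])

# Generic start: the classic zigzag, many small steps.
x_gen, n_gen = steepest_descent(B, [9.0, 1.0])
```

With the eigenvector start, n_eig comes out as 1; the generic start needs far more iterations to reach the same tolerance, which is exactly the inefficiency the lecture is pointing at.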
So if we're lucky with our initial guess, gradient descent converges quickly, so in
one step.
But obviously we can't rely on luck, because it is very improbable to choose a point exactly along this axis or along this axis.
Then we might have another idea, which works like this.
Try to replace this zigzag behavior, where these are supposed to be right angles, by just one step and then one more step.
So let's say we converge to this point, then we go here.
Sorry, this is also supposed to be a right angle.
So how can we try to do this? How would we start replacing this zigzag behavior with this two-step method?
The idea would be to not stop here, but to go further along this direction and stop at
the right point and then make one step in this direction instead of going along this
direction, the first direction, the second direction, the first direction, the second
direction and so on.
So if we were able to replace these infinitely many steps by just going along this direction long enough and then along that direction long enough, then we would be exactly at the point where we are trying to get to.
So set d0, where d is for direction, and we just call this the negative gradient of phi at x0. And then set, well, we're in two dimensions here, so there are only two directions to choose, but in n dimensions we of course have d1, ..., dn-1 such that the full set of directions, including d0, is a basis of orthogonal vectors.
So we can always do that, we choose d0 for example like this and then we just choose
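The idea so far can be sketched in code. Everything concrete below (the matrix behind the example, the start point, and the direction choices) is an illustrative assumption. The key point is that the "right" step length along each direction is the orthogonal projection of the current error x* - x onto that direction, which requires knowing the minimum x* itself: exactly the information we do not have in practice.

```python
# Orthogonal directions descent (sketch): step along each direction of an
# orthogonal basis exactly far enough to remove that component of the error.
# NOTE: alpha below uses x_star itself, which is why this is not a
# practical method, only an idealized picture.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonal_directions_descent(x0, x_star, directions):
    x = list(x0)
    for d in directions:
        e = [s - c for s, c in zip(x_star, x)]   # current error x* - x
        alpha = dot(d, e) / dot(d, d)            # projection of the error onto d
        x = [c + alpha * di for c, di in zip(x, d)]
    return x

# Illustrative 2D example: start at (9, 1), minimum at the origin.
# d0 is chosen like the negative gradient of phi(x) = 1/2 x^T B x with
# B = diag(1, 9) at the start point; d1 is orthogonal to d0.
x_star = [0.0, 0.0]
d0 = [-9.0, -9.0]            # -grad phi at x0 = (9, 1)
d1 = [-9.0, 9.0]             # orthogonal to d0
x_final = orthogonal_directions_descent([9.0, 1.0], x_star, [d0, d1])
```

After n steps along n mutually orthogonal directions, every component of the initial error has been removed, so the method reaches the minimum exactly, in two steps here instead of infinitely many zigzag steps.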
[Video metadata: duration 00:18:49 min, recorded 2021-12-14, uploaded 2021-12-14, open access, language en-US]